Conference Proceedings

Semi-Supervised Dialogue Policy Learning via Stochastic Reward Estimation

Xinting Huang, Jianzhong Qi, Yu Sun, Rui Zhang

58TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2020) | ASSOC COMPUTATIONAL LINGUISTICS-ACL | Published : 2020

Abstract

In task-oriented dialogue systems, dialogue policy optimization typically receives feedback only upon task completion. This is insufficient for training intermediate dialogue turns, since supervision signals (or rewards) are provided only at the end of dialogues. To address this issue, reward learning has been introduced to learn from state-action pairs of an optimal policy and provide turn-by-turn rewards. This approach requires complete state-action annotations of human-to-human dialogues (i.e., expert demonstrations), which is labor-intensive. To overcome this limitation, we propose a novel reward learning approach for semi-supervised policy learning. The proposed approach learns a dynamics model as ..
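The abstract's core idea, replacing a single end-of-dialogue reward with learned turn-by-turn rewards derived from expert state-action pairs, can be illustrated with a toy sketch. This is a hypothetical example, not the paper's actual model: states are simplified to single floats, the expert pairs are invented, and a nearest-demonstration similarity score stands in for a learned reward function.

```python
import math

# Hypothetical toy data (not from the paper): expert state-action pairs,
# where the "expert" chooses action 1 in high-valued states and action 0
# in low-valued ones. States are simplified to single floats.
EXPERT_PAIRS = [(1.0, 1), (0.9, 1), (0.1, 0), (0.0, 0)]

def turn_reward(state, action, expert_pairs=EXPERT_PAIRS):
    """Dense per-turn reward: how closely a candidate state-action pair
    matches the nearest expert demonstration."""
    # Distance to the nearest expert pair: state gap plus a penalty of 1.0
    # whenever the action differs from the expert's.
    gap = min(abs(state - s) + (0.0 if action == a else 1.0)
              for s, a in expert_pairs)
    # Map the gap to a reward in (0, 1]; an exact expert match scores 1.0.
    return math.exp(-gap)

# An expert-consistent turn earns a higher reward than a deviating one,
# so intermediate turns receive a signal instead of only the final one.
print(turn_reward(1.0, 1))  # exact match with an expert pair -> 1.0
print(turn_reward(1.0, 0))  # nearest expert pair differs -> lower reward
```

The point of the sketch is only the shape of the supervision: every turn gets a scalar reward grounded in expert behavior, rather than waiting for a task-completion signal at the end of the dialogue.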



Grants

Awarded by the Australian Research Council (ARC)


Funding Acknowledgements

We would like to thank Xiaojie Wang for his help. This work is supported by the Australian Research Council (ARC) Discovery Project DP180102050 and the China Scholarship Council (CSC).